issue/1090: Add flash attention for QY machines #1099

Open
xgqdut2016 wants to merge 4 commits into main from issue/1090

xgqdut2016 commented Mar 20, 2026

xgqdut2016 requested a review from a team March 20, 2026 07:29
} else if (device.getType() == Device::Type::CPU) {
return at::Device(at::kCPU);
} else if (device.getType() == Device::Type::QY) {
return at::Device(at::kCUDA, device.getIndex());
Does this code compile on NVIDIA?
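One way to address this concern is to compile the QY branch only when a QY-specific build flag is set, so an NVIDIA build never sees it. The sketch below is self-contained and uses stand-in types: `DeviceType`, `Device`, `TorchDevice`, and the macro `ENABLE_QY_FLASH_ATTN` are hypothetical names chosen for illustration, not the project's or ATen's actual definitions.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for the project's Device and for at::Device.
enum class DeviceType { CPU, NVIDIA, QY };
struct Device { DeviceType type; int index; };
struct TorchDevice { std::string backend; int index; };

// ENABLE_QY_FLASH_ATTN is an assumed macro name; a real build would define
// it only when the QY target is selected.
inline TorchDevice to_torch_device(const Device &device) {
    switch (device.type) {
    case DeviceType::CPU:
        return {"cpu", -1};  // no device index is needed for the CPU
    case DeviceType::NVIDIA:
        assert(device.index >= 0);
        return {"cuda", device.index};
#ifdef ENABLE_QY_FLASH_ATTN
    case DeviceType::QY:
        // Mirrors the diff: QY devices are exposed through the CUDA backend.
        assert(device.index >= 0);
        return {"cuda", device.index};
#endif
    default:
        throw std::runtime_error("unsupported device type");
    }
}
```

Without the flag, a QY device falls through to the error path, and the NVIDIA and CPU branches are compiled exactly as before.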


local INFINI_ROOT = os.getenv("INFINI_ROOT") or (os.getenv(is_host("windows") and "HOMEPATH" or "HOME") .. "/.infini")

local FLASH_ATTN_QY_CUDA_SO_CONTAINER_DEFAULT =
Don't use hard-coded paths.
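The build script under review is Lua, but the lookup order the reviewer is asking for can be sketched in C++ for consistency with the other snippets: take the path from an environment variable when it is set, and otherwise fall back to a value supplied by the caller. The helper name `path_from_env` and any variable name passed to it are hypothetical; the intent is only to show that no machine-specific path needs to be baked into the source.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Return the value of `env_var` if it is set and non-empty; otherwise return
// `fallback`, which the caller derives from configuration rather than from a
// literal path in the source.
inline std::string path_from_env(const char *env_var, const std::string &fallback) {
    assert(env_var != nullptr);
    const char *value = std::getenv(env_var);
    if (value != nullptr && *value != '\0') {
        return std::string(value);
    }
    return fallback;
}
```

A caller could pass a fallback built from `INFINI_ROOT`, which the script already reads from the environment, so the default also stays relocatable.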

auto v_cache = infinicore::adaptor::to_aten_tensor(p->v_cache);
auto seqlens_k = std::optional<const at::Tensor>(infinicore::adaptor::to_aten_tensor(p->seqlens_k));
auto block_table = std::optional<at::Tensor>(infinicore::adaptor::to_aten_tensor(p->block_table));
// FlashAttention kernels expect standard dense layout (contiguous last dimension).
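When a change touches code shared with other platforms, introduce a separate build option and make the change under it in isolation; it must not affect the existing code for other platforms.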
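A minimal sketch of how the layout requirement from the quoted comment could be confined to a QY-only build option. `TensorView`, `last_dim_contiguous`, `needs_dense_copy`, and the macro `ENABLE_QY_FLASH_ATTN` are hypothetical names for illustration; the real code operates on `at::Tensor`, which is not modelled here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified tensor descriptor: sizes and element strides.
struct TensorView {
    std::vector<std::int64_t> shape;
    std::vector<std::int64_t> strides;
};

// "Contiguous last dimension" means neighbouring elements along the last
// axis are adjacent in memory, i.e. the last stride is 1.
inline bool last_dim_contiguous(const TensorView &t) {
    assert(t.shape.size() == t.strides.size());
    return t.strides.empty() || t.strides.back() == 1;
}

// Only a build that defines the assumed macro ENABLE_QY_FLASH_ATTN requests
// an extra dense copy; every other platform keeps its original behaviour.
inline bool needs_dense_copy(const TensorView &t) {
#ifdef ENABLE_QY_FLASH_ATTN
    return !last_dim_contiguous(t);
#else
    (void)t;
    return false;
#endif
}
```

Keeping the decision in one guarded helper means the shared call sites stay unchanged for existing platforms.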


3 participants